Computing Similarity between RNA Strings
Authors
Abstract
Ribonucleic acid (RNA) strings are strings over the four-letter alphabet {A, C, G, U} with a secondary structure of base-pairing between A-U and C-G pairs in the string. Edges are drawn between two bases that are paired in the secondary structure, and these edges have traditionally been assumed to be noncrossing. The noncrossing base-pairing naturally leads to a tree-like representation of the secondary structure of RNA strings. In this paper, we address several notions of similarity between two RNA strings that take into account both the primary sequence and the secondary base-pairing structure of the strings. We present efficient algorithms for exact matching and approximate matching between two RNA strings. We define a notion of alignment between two RNA strings and devise algorithms based on dynamic programming. We then present a method for optimally aligning a given RNA string with unknown secondary structure to one with known sequence and structure, thus attacking the structure prediction problem in the case when the structure of a closely related sequence is known. The techniques employed to prove our results include reductions to well-known string matching problems allowing wild cards and ranges, and speeding up dynamic programming by using the tree structures implicit in the secondary structure of RNA strings.
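The tree-like representation mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm; it only shows, under stated assumptions, how a set of noncrossing base pairs induces a tree. The names `Node`, `is_noncrossing`, and `build_tree` are hypothetical, positions are 0-based, and each base is assumed to take part in at most one pair.

```python
from dataclasses import dataclass, field

# Base pairs named in the abstract (A-U and C-G), in either orientation.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


@dataclass
class Node:
    """Tree node: a base pair (i, j) with i < j, or an unpaired base (i, i)."""
    i: int
    j: int
    children: list = field(default_factory=list)


def is_noncrossing(pairs):
    """Return True if no two pairs (i, j) and (k, l) satisfy i < k < j < l.

    Assumes every position appears in at most one pair.
    """
    partner = {}
    for i, j in pairs:
        partner[i], partner[j] = j, i
    stack = []
    for pos in sorted(partner):
        if partner[pos] > pos:
            stack.append(pos)          # opening end of a pair
        elif not stack or stack[-1] != partner[pos]:
            return False               # closes a pair that is not innermost
        else:
            stack.pop()
    return not stack


def build_tree(sequence, pairs):
    """Build the tree induced by noncrossing pairs over `sequence`.

    The root is an artificial node spanning the whole string; a pair's
    children are the pairs and unpaired bases directly enclosed by it.
    """
    partner = {}
    for i, j in pairs:
        if (sequence[i], sequence[j]) not in VALID_PAIRS:
            raise ValueError(f"positions {i} and {j} cannot pair")
        partner[i], partner[j] = j, i
    if not is_noncrossing(pairs):
        raise ValueError("pairs cross, so they do not form a tree")

    root = Node(-1, len(sequence))     # artificial root, not a real pair
    stack = [root]
    for pos in range(len(sequence)):
        if pos not in partner:
            stack[-1].children.append(Node(pos, pos))
        elif partner[pos] > pos:
            node = Node(pos, partner[pos])
            stack[-1].children.append(node)
            stack.append(node)
        else:
            stack.pop()                # closing end: return to the parent
    return root
```

For example, `build_tree("GAAAC", [(0, 4)])` yields a root with a single child for the G-C pair, which in turn has the three unpaired A bases as children. Dynamic programming over such a tree can then recurse on subtrees instead of on arbitrary substring intervals, which is the kind of speed-up the abstract alludes to.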
Similar resources
Computing Similarity between RNA Strings
Ribonucleic acid (RNA) strings are strings over the four-letter alphabet {A, C, G, U} with a secondary structure of base-pairing between A-U and C-G pairs in the string. Edges are drawn between two bases that are paired in the secondary structure and these edges have traditionally been assumed to be noncrossing. The noncrossing base-pairing naturally leads to a tree-like representation of t...
Local Alignment of RNA Sequences with Arbitrary Scoring Schemes
Local similarity is an important tool in comparative analysis of biological sequences, and is therefore well studied. In particular, the Smith-Waterman technique and its normalized version are two established metrics for measuring local similarity in strings. In RNA sequences however, where one must consider not only sequential but also structural features of the inspected molecules, the concep...
13 Comparative RNA analysis
• R. Durbin, S. Eddy, A. Krogh and G. Mitchison, Biological sequence analysis, Cambridge, 1998
• D.W. Mount, Bioinformatics: Sequences and Genome analysis, 2001
• V. Bafna, S. Muthukrishnan, R. Ravi, Computing similarity between RNA strings
• D. Sankoff, Simultaneous solution of the RNA Folding, Alignment and Protosequence Problems, SIAM Journal of Appl. Math., 45, 5, 1985
• J. Gorodkin, L.J. ...
"Recent Methods for RNA Modeling Using Stochastic Context-free Grammars," Proc. Combinatorial Pattern
Ribonucleic acid (RNA) strings are strings over the four-letter alphabet {A, C, G, U} with a secondary structure of base-pairing between A-U and C-G pairs in the string. Edges are drawn between two bases that are paired in the secondary structure and these edges have traditionally been assumed to be noncrossing. The noncrossing base-pairing naturally leads to a tree-like representation of the s...
Harry: A Tool for Measuring String Similarity
Comparing strings and assessing their similarity is a basic operation in many application domains of machine learning, such as in information retrieval, natural language processing and bioinformatics. The practitioner can choose from a large variety of available similarity measures for this task, each emphasizing different aspects of the string data. In this article, we present Harry, a small t...
Journal title:
Volume, issue:
Pages: -
Publication date: 1995